Skip to content

GH-38007: [C++] Add VariableShapeTensor implementation#38008

Open
rok wants to merge 62 commits intoapache:mainfrom
rok:38007
Open

GH-38007: [C++] Add VariableShapeTensor implementation#38008
rok wants to merge 62 commits intoapache:mainfrom
rok:38007

Conversation

@rok
Copy link
Member

@rok rok commented Oct 4, 2023

Rationale for this change

We want to add VariableShapeTensor extension type definition for arrays containing tensors with variable shapes.

What changes are included in this PR?

This adds a C++ implementation.

Are these changes tested?

Yes.

Are there any user-facing changes?

This adds a new extension type C++.

@AlenkaF
Copy link
Member

AlenkaF commented Nov 16, 2023

Hi @rok! It would be great to have this in 15.0.0 (added the milestone for visibility). Do you think you have enough bandwidth to work on it at the moment?

@rok
Copy link
Member Author

rok commented Nov 16, 2023

@AlenkaF yes I'd very much like to continue working on this, I was waiting for the release before chasing reviewers :D
I'll rebase tonight and would very much appreciate reviews afterwards.

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting committer review Awaiting committer review labels Nov 23, 2023
@github-actions github-actions bot added awaiting change review Awaiting change review awaiting changes Awaiting changes and removed awaiting changes Awaiting changes awaiting change review Awaiting change review labels Nov 28, 2023
Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments. I haven't looked at the tests too closely.

if(ARROW_TESTING)
add_library(arrow_compute_kernels_testing OBJECT test_util_internal.cc)
add_library(arrow_compute_kernels_testing OBJECT
test_util_internal.cc ../../extension/tensor_extension_array_test.cc)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this? This doesn't sound logical.

return Status::OK();
}
ARROW_EXPORT
Status ComputeStrides(const std::shared_ptr<DataType>& value_type,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could return Result<std::vector<int64_t>> for better ergonomics

const auto size = static_cast<int64_t>(permutation.size());
std::vector<uint8_t> dim_seen(size, 0);
ARROW_EXPORT
Status IsPermutationValid(const std::vector<int64_t>& permutation);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These helper functions all could take std::span<const int64_t> as inputs, though not important either.

std::vector<int64_t>* strides) {
auto fixed_width_type = internal::checked_pointer_cast<FixedWidthType>(value_type);
if (permutation.empty()) {
return internal::ComputeRowMajorStrides(*fixed_width_type.get(), shape, strides);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is doing the exact same thing except no permutation, please reconcile the two functions instead of duplicating code.

return false;
}

auto is_permutation_trivial = [](const std::vector<int64_t>& permutation) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, please don't duplicate code like this, factor out into common helpers.

Comment on lines +170 to +172
for (const auto& x : document["permutation"].GetArray()) {
permutation.emplace_back(x.GetInt64());
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if "permutation" is not an array of ints?

)
])
def test_tensor_type_str(tensor_type, text):
def test_tensor_type_str(tensor_type, text, pickle_module):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change doesn't seem useful?

assert result.to_tensor().shape == (1, 3, 2, 2)
assert result.to_tensor().strides == (12 * bw, 1 * bw, 6 * bw, 2 * bw)

tensor_type = pa.fixed_shape_tensor(arrow_type, [2, 2, 3], permutation=[2, 1, 0])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose it's unrelated but it's testing something useful anyway?

const auto start_position = data_array->offset() * byte_width;
const auto size = std::accumulate(shape.begin(), shape.end(), static_cast<int64_t>(1),
std::multiplies<>());
ARROW_CHECK_EQ(size * byte_width, data_array->length() * byte_width);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be evidently simplified.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[C++][Python] Add VariableShapeTensor implementation

6 participants

Comments